Table of Contents

Load the data

Memory Reduction

data type: datetime

data type: object

Hidden Continuous features

Create New Columns

Missing Values

fill NaNs with mean

fill NaNs with another column

fill NaNs with zero

Dummy variables

Split train test data

Prepare the data for train and test separately

Weight of Evidence (WoE) and Information Value (IV)

woe dummies: grade

woe dummies: home_ownership

woe dummies: addr_state

woe dummies: verification_status

woe dummies: purpose

woe dummies: initial_list_status

woe dummies: term_int

woe dummies continuous: emp_length_int

woe dummies continuous: mths_since_issue_d

woe dummies continuous: int_rate

woe dummies continuous: funded_amnt **

woe dummies continuous: mths_since_earliest_cr_line

woe dummies continuous: delinq_2yrs **

woe dummies continuous: inq_last_6mths

woe dummies continuous: open_acc

woe dummies continuous: pub_rec

woe dummies continuous: total_acc

woe dummies continuous: acc_now_delinq

woe dummies continuous: total_rev_hi_lim

woe dummies continuous: installment

woe dummies continuous: annual_inc

woe dummies continuous: mths_since_last_delinq **

woe dummies continuous: dti **

woe dummies continuous: mths_since_last_record ** missing

Save the train data and repeat for test